<<<<<<< HEAD ======= >>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98 SignalingProfiler tutorial

Contents

library(DT)
library(ggplot2)
library(xfun)
library(SignalingProfiler)
library(tidyverse)

Introduction

SignalingProfiler 2.0 is a modelling pipeline described in Venafra et al., 2024.

The SignalingProfiler package provides a flexible way to create signaling networks integrating transcriptomic, proteomic and phosphoproteomic data.

Briefly, the method allows the user to build a mechanistic model representing the signal remodeling upon a treatment or in a disease condition. The model starts with perturbed receptor(s) and ends with proteins whose activity is modulated (up- or down-regulated) among your two conditions (e.g. treated vs untreated, disease vs healthy).

The SignalingProfiler pipeline is modular. You can exploit the method to just perform one of the proposed tasks.

The pipeline is divided in three main steps:

At the end of the pipeline, you can extract functional circuits connecting perturbed nodes to crucial phenotypes for your context.

1 Installation

<<<<<<< HEAD
devtools::install_github('https://github.com/SaccoPerfettoLab/SignalingProfiler/')

1.1 Prerequisites:

1.1.1 ILP Solver

=======

devtools::install_github('https://github.com/SaccoPerfettoLab/SignalingProfiler/')

1.1 Prerequisites: ILP Solver

>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

The most important prerequisites for SignalingProfiler is the ILP solver required by CARNIVAL algorithm to optimize your model against experimental data. An exhaustive guide about installation can be found in Prerequisites section of CARNIVAL GitHub page here.

If you want to create networks with SignalingProfiler install the ILP solver

<<<<<<< HEAD

1.1.2 Python configuration

To run the last step of SignalingProfiler is required to properly configure the reticulate environment. To do so, run the following command the first time you import SignalingProfiler.

install_sp_py()
======= >>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

2 SignalingProfiler input

SignalingProfiler requires processed tables derived from your multi-omics data. Here, we describe the information needed by the package.

2.1 Transcriptomics and Proteomics

Your transcriptomics and proteomics tables should contain the following columns:

  • gene_name and Uniprot ID for proteomics; SignalingProfiler is structured having the gene_name as key;

  • difference: representing the fold-change in gene expression among two conditions; e.g. the difference between treated and control sample;

The next columns regards parameters linked to the statistical test performed among your conditions:

  • logpval: the -Log(p-value) associated to your fold-change;

  • significant: a column containing a ‘+’ if the gene is significantly modulated, NA otherwise;

Transcriptomics data example

<<<<<<< HEAD
data("tr_toy_df")
DT::datatable(tr_toy_df, options = list(pageLength = 3))
=======

data("tr_toy_df")
DT::datatable(tr_toy_df, options = list(pageLength = 3))
>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

Proteomics data example


data("prot_toy_df")
DT::datatable(prot_toy_df, options = list(pageLength = 3))
<<<<<<< HEAD
=======
>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

2.2 Phosphoproteomics

Before using Signaling Profiler, you have to accurately process you phosphoproteomic data set. The most important step is to have ONLY ONE OCCURENCE OF EACH PHOSPHOSITE. If you have a phosphosite quantified more than once, you have to choose the most reliable quantification (usually the lowest multiplicity).

Your phosphoproteomics table should contain:

  • UniProt ID and gene_name: reporting the UNIPROT ID and the gene name respectively; again, choose just one gene name.

  • aminoacid and position: reporting the phosphosite with the single letter notation and the position in the protein sequence;

  • sequence window: the phosphopeptide centered on the modified residue; it should be at least a 15-mer, compatible with SIGNOR and PsP database notation;

  • difference: representing the fold-change in gene expression among two conditions; e.g. the difference between treated and control sample;

  • logpval: the -Log(p-value) associated to the statistical test assessing the significance of your comparisons;

  • significant: a column containing a ‘+’ if the gene is significantly modulated, NA otherwise;

Phosphoproteomics data example


data("phospho_toy_df")
DT::datatable(phospho_toy_df, options = list(pageLength = 3))
<<<<<<< HEAD
=======
>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

3 SignalingProfiler output

The output of SignalingProfiler is a list composed of different elements toy_sp_output:

  • Network in igraph format toy_sp_output$igraph_network

  • Table of the proteins (nodes) included in the network with attributes toy_sp_output$nodes_df

    • Protein identifiers: UniProt ID and gene name;

    • Two different activity scores:

      • final_score: it is the activity derived from experimental data;
      • carnival_activity: it is the activity computed by CARNIVAL algorithm ranging between -100 and +100; usually it has the same sign as the final_score, but sometimes CARNIVAL changes the node activity because the neighbour proteins suggest the opposite activity sign;
      • discordant: it is TRUE if final_score and carnival_activity have opposite sign;
    • Molecular function annotation (mf);

data("toy_sp_output")
DT::datatable(toy_sp_output$nodes_df, options = list(pageLength = 3))
<<<<<<< HEAD
=======
>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98
  • Table of edges included in the network with biologically relevant attributes toy_sp_output$edges_df

    • Uniprot IDs of the nodes connected (source and target);

    • sign reporting if the interaction is activatory (1) or inhibitory (-1);

    • carnival_weight: since CARNIVAL may return more than one solustion, this attribute reports the frequency of the edge in the different solutions returned (e.g. 100 means that the edge is present in all the solutions!);

    • annotation from phosphoproteomics data:

      • aminoacid: reports the phosphosites in SIGNOR and PsP matching that interaction; if the phosphosite has a * it means that it is significantly modulated among your two conditions;

      • FC: reports the fold-change of phosphorylation among your two conditions of the phosphosite matched on the interaction;

      • is quantified is TRUE if you quantified the phosphosite matching that edge in your phosphoproteomic data set;

      • is significant is TRUE if the quantified phosphosite matching that edge in your phosphoproteomic data set is significantly modulated;


data("toy_sp_output")
DT::datatable(toy_sp_output$edges_df, options = list(pageLength = 3))
<<<<<<< HEAD
=======
>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

4 S1. Infer protein activities from multi-omics data

SignalingProfiler allows the user to infer protein activities from omic data comining:

  • Footprint-based analysis: infer the activity of a transcription factor (or kinases and phosphatases) from the modulation of its targets (transcripts/phosphosites);

  • Phosphoscore analysis: infer the activity of phosphoproteins from the modulation of their regulatory phosphosites;

4.1 Footprint-based analysis

Footprint-based analysis derives protein activities from the modulation of downstream targets using the VIPER algorithm. In particular:

  • TFEA (Transcription Factor Enrichment Analysis) infers transcription factors activity from the modulation of target genes in transcriptomics data;

  • KSEA (Kinase Substrates Enrichment Analysis) infers kinases/phosphatases activity from the modulation of target phosphosites in phosphoproteomics data;

The set of regulated analytes of a TF/KIN/PHOS is called regulon.

4.1.1 Regulons sources

SignalingProfiler built-in regulons

  • TFs-genes regulons (tfea_db_human, tfea_db_human_collectri) are retrieved from DoRothEA R Package (confidence: A) and from SIGNOR or from Collectri, a new comprehensive resource.

  • KIN/PHOS-phosphosites regulons (ksea_db_human) are derived from:OmniPath and SIGNOR; given the recent publication of the Serine-Threonine-Tyrosine Kinome Atlas we integrated high confidence regulons in the ksea_db derived from this work. If you want to extend your data with the atlas, set integrated_regulons = TRUE.

Create you own TFEA/KSEA custom regulons

SignalingProfiler 2.0 updates its regulons (regulatory networks) twice a year. If the user want to use updated regulons, we implemented functions taking the information from different resources and making them SignalingProfiler compliant.


tfea_regulons <- create_tfea_regulons(resources = c('SIGNOR', 'Dorothea'),
                                      organism = 'human')
write_tsv(tfea_regulons, '../custom_regulons/tfea_custom_regulons.tsv')

ksea_regulons <- create_ksea_regulons(resources = c('SIGNOR', 'Omnipath'), 
                                      organism = 'human')
write_tsv(ksea_regulons, '../custom_regulons/ksea_custom_regulons.tsv')

4.1.2 Transcription factors’ activity inference


tf_activity_foot <- run_footprint_based_analysis(
  omic_data = tr_toy_df, 
  analysis = 'tfea', 
  organism = 'human', 
  reg_minsize = 10, 
  exp_sign = FALSE,
  collectri = FALSE,
  hypergeom_corr = TRUE,
  GO_annotation = TRUE,
  correct_proteomics = TRUE, 
  prot_df = prot_toy_df, 
  custom = FALSE,
  custom_path = NULL)

# Set custom = TRUE and specify your custom regulons path in 
# custom_path = '../custom_regulons/tfea_custom_regulons.tsv', 
# if you have custom regulons!

4.1.3 Kinases/Phosphatases activity inference


kin_phos_activity_foot <- run_footprint_based_analysis(
  omic_data = phospho_toy_df, 
  analysis = 'ksea', 
  organism = 'human', 
  reg_minsize = 5, 
  exp_sign = FALSE, 
  integrated_regulons = TRUE, 
  hypergeom_corr = TRUE,
  GO_annotation = TRUE, 
  correct_proteomics = TRUE, 
  prot_df = prot_toy_df, 
  custom = FALSE,
  custom_path = NULL)

# Set custom = TRUE and specify your custom regulons path in 
# custom_path = '../custom_regulons/tfea_custom_regulons.tsv', 
# if you have custom regulons!

4.2 PhosphoScore analysis

PhosphoScore computation exploits:

  • the experimental fold-change of the phosphosites in phosphoproteomics;

  • the regulatory role of phosphosites (activating or inhibiting) annotated in SIGNOR and PsP databases;

4.2.1 Regulatory phosphosites sources

<<<<<<< HEAD

The information about how phosphosites regulate the protein they are on is retrieved from SIGNOR and PsP databases (good_phos_df_human_act, good_phos_df_human_act_aapos). The key for identifying the phosphosite can be either Gene-Peptide or Gene-AAPos.

=======

The information about how phosphosites regulate the protein they are on is retrieved from SIGNOR and PsP databases (good_phos_df_human_act, good_phos_df_human_act_aapos). The key for identifying the phosphosite can be either Gene-Peptide or Gene-AAPos.

>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

SignalingProfiler updates its regulons (regulatory networks) twice a year. If the user want to use updated information it can create its own information about regulatory phosphosites. Importantly, if you want to include PsP information you have to manually download from PsP database the Regulatory_sites and Kinase_Substrates file.


reg_phosphosites <- get_phosphoscore_info(resources = c('SIGNOR', 'PsP'), 
                      organism = 'human', 
                      psp_reg_site_path = '.Regulatory_sites',
                      psp_kin_sub_path = '.Kinase_Substrates',
                      aa_pos = FALSE, # TRUE if you want Gene-AA-Pos as key
                      only_activatory = TRUE)

write_tsv(reg_phosphosites, '../custom_regulons/act_phosphosites.tsv')

4.2.2 Infer the activity of phosphorylated proteins


phosphoscore_df <- phosphoscore_computation(phosphoproteomic_data = phospho_toy_df, 
                                            organism = 'human', 
                                            activatory = TRUE , 
                                            GO_annotation = TRUE,
                                            custom = FALSE, 
                                            custom_path = NULL)
# Set custom = TRUE and specify your custom phosphosites path in 
# custom_path = '../custom_regulons/act_phosphosites.tsv', 
# if you have custom phosphosites!

4.3 Final score computation

Transcription factors, kinases and phosphatases could have an activity derived from both their targets (transcripts/phosphosites) and from their regulatory phosphosites. The final score computation integrates these two scores. Importantly, the computation should be done separately for transcription factors and kinases/phosphatases.


# for transcription factors
combined_tf <- combine_footprint_and_phosphoscore(
  footprint_output = tf_activity_foot,
  phosphoscore_df =  phosphoscore_df, 
  analysis =  'tfea')

toy_tf <- combined_tf

# for kinases and phosphatases
combined_kin_phos <- combine_footprint_and_phosphoscore(
  footprint_output = kin_phos_activity_foot,
  phosphoscore_df =  phosphoscore_df, 
  analysis =  'ksea')

toy_kin <- combined_kin_phos

Then, consider the phosphorylated proteins of the PhosphoScore technique that are not TF/KIN/PHOS as OTHER:


toy_other <- phosphoscore_df %>%
  dplyr::filter(mf == 'other') %>%
  dplyr::rename(final_score = phosphoscore) %>%
  dplyr::mutate(method = 'PhosphoScore')

Then, combine all the information in a final activity table for the next steps.


toy_prot_activity_df <- dplyr::bind_rows(toy_tf, toy_kin, toy_other) %>%
  select(UNIPROT, gene_name, mf, final_score, method)

4.3.1 Final score table example example


data('toy_prot_activity_df')
DT::datatable(toy_prot_activity_df, options = list(pageLength = 3))
<<<<<<< HEAD
=======
>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

4.4 Example of Step 1 result


toy_prot_activity_df$mf <- factor(
  toy_prot_activity_df$mf, 
  levels = c('tf', 'kin', 'phos', 'other'))

ggplot(toy_prot_activity_df,
       aes(x = fct_reorder(gene_name, final_score), y = final_score)) +
  geom_bar(stat = "identity", aes(fill = final_score), alpha = 1) +
  facet_grid(mf ~ ., space ='free_y', scales = 'free') +
  theme_classic() +
  scale_fill_gradient2(low = "#89ABD6", high = "#FF6A72",
                       mid = "whitesmoke", midpoint = 0) +
  theme(text = element_text(size= 6),
        axis.text.x = element_text(angle = 0, vjust = 1, size = 5),
        legend.position="bottom",
        legend.title = element_blank()) +
  xlab('') +
  coord_flip()+
  ylab('Final score') 
<<<<<<< HEAD

=======

>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

SignalingProfiler offers different techniques to derive the activity of proteins. It is up to you and the data you have the choice of the methods and the combination of methods to use. Importantly, the analysis is flexible and powerful. For example, if you have only phosphoproteomics data, you can still infer the activity of transcription factors using their regulatory phosphosites in the PhosphoScore analysis!!

At the end of Step 1, SignalingProfiler outputs a single table listing all proteins with their inferred activity and molecular function, each represented by a unique combined score from one or both methods.


5 S2. Connect inferred proteins in a causal network

In this step, SignalingProfiler connects the proteins of Step1 to user-specified starting point(s) (e.g., signaling receptors). If you do not have a starting point, you can still use SignalingProfiler using the inverseCARNIVAL algorithm.

5.1 Prior Knowledge Network

To connect the proteins, SignalingProfiler uses the prior information about molecular interactions stored in SIGNOR and PhosphoSitePlus databases. Briefly, every PKN interaction is binary, directed (has a regulator and a target of the regulation), and signed (representing either an up- or a down-regulation).

The information is stored in different built-in objects that can contain only direct interactions (PKN_human_dir) or the addition of indirect transcriptional regulations (PKN_human_ind).

<<<<<<< HEAD

Custom PKN SignalingProfiler updates its prior knowledge network twice a year. We offer the user an interface to Omnipath and SIGNOR to assemble an updated PKN in SignalingProfiler compliant format. Remember, if you want to include PsP you have to provide its manually downloaded files!

=======

Custom PKN SignalingProfiler updates its prior knowledge network twice a year. We offer the user an interface to Omnipath and SIGNOR to assemble an updated PKN in SignalingProfiler compliant format. Remember, if you want to include PsP you have to provide its manually downloaded files!

>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

create_PKN(database = c('SIGNOR', 'PsP'), #'Omnipath is also supported
           direct = TRUE,
           organism = 'human', 
           psp_reg_site_path = './Regulatory_sites',
           psp_kin_sub_path = './Kinase_Substrate_Dataset', 
           file_path =  './custom_regulons/custom_PKN.RDS')

5.1.1 Choose and preprocess the PKN

The first step to build a network is choosing the Prior Knowledge Network (PKN).

SignalingProfiler allow you to use built-in PKNs for mouse and human or to exploit a custom PKN. For the custom PKN you have to provide the path of the file.

For human you can choose between a PKN containing the interactions from only databases, or a PKN containing interactions also derived from the kinome atlas, specifying the with_atlas = TRUE. In our benchmarking it resulted in a network with an artificial number of hubs, but it could be still useful if you are interested in not known phosphorylation events!


PKN_table <- choose_PKN(organism = 'human', 
                        with_atlas = FALSE,
                        direct = TRUE,
                        custom = FALSE,
                        custom_path = NULL)

# If you want to use your custom PKN put custom = TRUE and 
# custom_path = '../custom_regulons/custom_PKN.RDS'

Since PKN interactions come from different experimental systems, a good practice is to keep only the ones involving analytes expressed in your system, so quantified in at least one of your omics data sets.


PKN_expressed <- preprocess_PKN(
  omics_data = list(tr_toy_df, prot_toy_df, phospho_toy_df),
  PKN_table = PKN_table)

5.2 Build a naïve network

To speed up the analysis, we want to restrict the PKN interactome to the neighborhood of the inferred proteins, generating a naïve network. It is a graph connecting your proteins through ALL the possible SHORTEST causal paths of a user-specified length (1 to 4). We don’t consider the causality of the edges, just distance. As such, in the naïve network there may be two paths with same length, but different causal meaning.

SignalingProfiler offers three types of naïve networks, but the benchmarking process revealed the two-layered as optimal. We suggest to explore also the three-layered!


# divide proteins according to the molecular function
kin_phos_other <- toy_prot_activity_df %>% 
  dplyr::filter(mf %in% c('kin', 'phos', 'other'))
tfs <- toy_prot_activity_df %>% 
  dplyr::filter(mf == 'tf')

# create the naïve network
two_layers_toy <- two_layer_naive_network(starts_gn = c('MTOR', 'AMPK'),
                             intermediate_gn = kin_phos_other$gene_name,
                             targets_gn = tfs$gene_name, 
                             PKN_table = PKN_expressed, #or PKN_human_dir
                             max_length_1 = 3, 
                             max_length_2 = 4, 
                             connect_all = TRUE, 
                             rds_path = './two_layers_naive.rds',
                             sif_path = './two_layers_naive.sif')

5.3 CARNIVAL optimization

CARNIVAL algorithm optimizes the naïve network in a context-specific model. This means that CARNIVAL will retrieve from the naive network only the edges coherent with context-specific activity of proteins from the Step 1.

5.3.1 Prepare input for CARNIVAL optimization

Once you have created the naïve network, you have to prepare the result for the next step.


two_layers_toy <- readRDS('./two_layers_naive.rds')

receptor_list <- list('MTOR' = -1, 'AMPK' = 1)

carnival_input_toy <- prepare_carnival_input(two_layers_toy, 
                                             toy_prot_activity_df, 
                                             receptor_list, 
                                             organism = 'human')

DT::datatable(carnival_input_toy, options = list(pageLength = 3))
<<<<<<< HEAD
=======
>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

5.3.2 ILP Solver installation

CARNIVAL requires the ILP solver to be installed. We suggest cplex by IBM. A similar performance is observed using the Gurobi solver. To run the analysis you have to specify the name of the solver, so that we can set the default option for the run.


solver = 'cplex'
carnival_options = default_CARNIVAL_options(solver)

Then, you have to manually specify the ILP solver path.


#for Windows
carnival_options$solverPath = "C:/Program Files/CPLEX_solver/cplex/bin/x64_win64/cplex.exe" 

#for macOS
carnival_options$solverPath = '/Applications/CPLEX_Studio221/cplex/bin/x86-64_osx/cplex' 

#for Linux
carnival_options$solverPath = '/opt/ibm/ILOG/CPLEX_Studio2211/cplex/bin/x86-64_linux/cplex'

# In Windows try the Gurobi ILP solver
# carnival_options = default_CARNIVAL_options(
#   solver = "gurobi",
#   gurobi_path = "C:\\Users\\sgual\\Downloads\\win64\\bin\\gurobi_cl.exe")

You can change the options used by the solver (try first with the default).


carnival_options$mipGAP <- 0.05
carnival_options$timelimit <- 3600
carnival_options$cplexMemoryLimit <- 8192

5.3.3 Run two-shots CARNIVAL

To run CARNIVAL algorithm:

  • create a tibble with the starting point(s) of the model (source_df);

  • Create a tibble with the ending points of the model (target_df);

  • Provide the naïve network in SIF format (Step 2 creates the file in your working directory) (naïve_network;

  • Provide the whole inferred proteins tibble (proteins_df);

  • Provide the CARNIVAL options you created in the former step (carnival_options);

  • According to the PKN you used you have to set the direct and with_atlas parameters in the function.

SignalingProfiler offers 4 versions of CARNIVAL algorithm for the optimization. Three version work with a known starting point of the model, one without (inverseCARNIVAL). In the benchmarking, we identified the two-shots CARNIVAL as the optimal way of running the analysis in terms of recapitulated proteins and phosphorylations within the network.

In this vignette, we propose the two-shots CARNIVAL. In this implementation, CARNIVAL:

  1. Connect the pertubed node(s) to kinases, phosphatases and other signaling proteins;

  2. Connect the kinases, phosphatases and other signaling proteins to transcription factors.

  3. Compute the union of the two graphs


# FIRST RUN: RECEPTOR to KIN, PHOS, OTHERS
receptors_df <- carnival_input_toy %>% dplyr::filter(mf == 'rec')

target1_df <- carnival_input_toy %>%
  dplyr::filter(mf %in% c('kin', 'phos', 'other'))

two_layers_toy_df <- readr::read_tsv('./two_layers_naive.sif',
                                 col_names = c('source', 'interaction', 'target'))

output1 <- run_carnival_and_create_graph(
  source_df = receptors_df,
  target_df = target1_df,
  naive_network = unique(two_layers_toy_df),
  proteins_df = carnival_input_toy,
  organism = 'human',
  carnival_options = carnival_options,
  files = FALSE,
  direct = FALSE,
  with_atlas = TRUE)


# SECOND RUN: from KIN, PHOS, OTHERS to TFs
run1_output_nodes <- convert_output_nodes_in_next_input(output1)

source_df <- run1_output_nodes %>%
  dplyr::filter(mf %in% c('kin', 'phos', 'other'))

target2_df <- carnival_input_toy %>%
  dplyr::filter(mf == 'tf')

output2 <- run_carnival_and_create_graph(
  source_df = source_df,
  target_df = target2_df,
  naive_network = unique(two_layers_toy_df),
  proteins_df = carnival_input_toy,
  organism = 'human',
  carnival_options = carnival_options,
  direct = FALSE,
  with_atlas = TRUE,
  files = FALSE)
                                         
# UNION OF RUN1 and RUN2 graphs 
union <- union_of_graphs(graph_1 = output1$igraph_network, 
                         graph_2 = output2$igraph_network, 
                         proteins_df = carnival_input_toy, 
                         files = TRUE, 
                         path_sif = './optimized_network.sif', 
                         path_rds = './optimized_object.rds')

5.3.4 Map phosphoproteomics on CARNIVAL output

After CARNIVAL optimization, SignalingProfiler maps phosphoproteomics data on the signaling edges of the model for a better biological interpretation of the result. In fact, since the network connects protein with different activity modulations, some interactions may represent phosphorylation events responsable for that.


optimized_object_rds <- readRDS('./optimized_object.rds')

toy_opt_network <- expand_and_map_edges(
  optimized_object = optimized_object_rds, 
  organism = 'human',
  phospho_df = phospho_toy_df,
  files = TRUE,
  direct = FALSE, 
  with_atlas = TRUE,
  path_sif = 'optimized_network_validated.sif',
  path_rds = 'optimized_object_validated.rds')

5.4 Example of Step 2 result

<<<<<<< HEAD

In the toy_opt_network dataset, we report an example of the Step 2 result. Here, it is reported the set of edges validated with phosphoproteomics data.

=======

In the toy_opt_network dataset, we report an example of the Step 2 result. Here, it is reported the set of edges validated with phosphoproteomics data.

>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

data('toy_opt_network')
DT::datatable(toy_opt_network$edges_df %>% 
                dplyr::filter(is_quantified == TRUE), 
              options = list(pageLength = 5))
<<<<<<< HEAD
=======
>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

At the end of Step 2, SignalingProfiler outputs a sign-coherent network connecting inferred proteins from Step 1 with causal interactions from SIGNOR.


6 S3. Connect phenotypes to the final model

To make the model functionally interpretable, SignalingProfiler connects phenotypes to the model proteins and infer the activity of phenotypes integrating the signaling axes in the model contributing to their activation/inhibition.

To this aim, we exploit the ProxPath algorithm to connect SIGNOR phenotypes to the model proteins and the PhenoScore algorithm, to infer the phenotypes activity.

6.1 Crucial concepts of PhenoScore algorithm

In the PhenoScore computation we indirectly connect proteins to the phenotypes. You can choose the path_length between proteins an phenotype, from 2 to 4.

Crucial concepts: - a protein can regulate a phenotype with multiple paths; - the phenotype will be regulated by activating and inhibitory paths;

Here there are the steps of the algorithm:

  1. Select significant close phenotypes from ProxPath: select from the ProxPath table the significantly close proteins next to your phenotypes of interest (key parameters are zscore_threshold, stat)

  2. Remove randomly regulated phenotypes: given your protein list (protein_df), we sample a random protein list with the same number of proteins and compute the number of activating and inhibitory paths over the phenotype of interest from both lists. If the random list produces the same number of paths of your input list over a phenotype of interest, the phenotype is discarded, because it is regulated at random (key parameters are:n_random, pvalue_threshold).

  3. <<<<<<< HEAD
  4. PhenoScore computation: we compute the average of the activity of proteins regulating the same phenotype, with three considerations: (i) if two independent regulators are connected in the network (sp_graph), consider the most downstream (remove_cascade = TRUE); (ii) we weight proteins according to their number of paths over the phenotype (since they are more resilient, node_idx = TRUE); (iii) the activity considered can be both experimentally derived or derived from CARNIVAL algorithm (use_carnival_activity = TRUE). :::

  5. =======
  6. PhenoScore computation: we compute the average of the activity of proteins regulating the same phenotype, with three considerations: (i) if two independent regulators are connected in the network (sp_graph), consider the most downstream (remove_cascade = TRUE); (ii) we weight proteins according to their number of paths over the phenotype (since they are more resilient, node_idx = TRUE); (iii) the activity considered can be both experimentally derived or derived from CARNIVAL algorithm (use_carnival_activity = TRUE). :::

  7. >>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

6.2 ProxPath preprocessing

To make the ProxPath protein-phenotypic relations context-specific, we remove interactions involving proteins not quantified in your proteomics data.


pheno_distances_table <- phenoscore_network_preprocessing(proteomics = prot_toy_df,
                                                          phospho = phospho_toy_df)

6.3 Run PhenoScore

6.3.1 Choose phenotypes

<<<<<<< HEAD

Check the available phenotypes from the proxpath_phenotypes table and create a vector of desired phenotypes. Alternatively, you can perform the analysis on all phenotypes.


data('proxpath_phenotypes')
DT::datatable(proxpath_phenotypes, options = list(pageLength = 3))
=======

Check the available phenotypes from the proxpath_phenotypes table and create a vector of desired phenotypes. Alternatively, you can perform the analysis on all phenotypes.


data('proxpath_phenotypes')
DT::datatable(proxpath_phenotypes, options = list(pageLength = 3))
>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

# example phenotypic vector
desired_phenotypes <- c('APOPTOSIS',
                        'CELL_DEATH',  'CELL_GROWTH', 'CELL_CYCLE_BLOCK',
                        'DNA_REPAIR',
                        'PROLIFERATION', 'IMMORTALITY', 'SURVIVAL')

toy_phenoscore_output<- phenoscore_computation(
  proteins_df = toy_opt_network$nodes_df,
  desired_phenotypes = desired_phenotypes,
  pheno_distances_table = NULL, 
  sp_graph = toy_opt_network$igraph_network,
  
  # closeness of proteins to phenotypes
  path_length = 3,
  stat = 'mean',
  zscore_threshold = -1.96,
  
  # exclude random phenotypes
  n_random = 1000,
  pvalue_threshold = 0.05,
  
  # optimized network  specificity
  remove_cascade = TRUE,
  node_idx = FALSE,
  use_carnival_activity = FALSE, 
  create_pheno_network = TRUE)

6.4 Example of Step 3 result

<<<<<<< HEAD

The object returned by phenoscore_computation function contains a barplot to readily interpret the cellular functions modulated by your network.

=======

The object returned by phenoscore_computation function contains a barplot to readily interpret the cellular functions modulated by your network.

>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

data('toy_phenoscore_output')
phenoscore_df1 <- toy_phenoscore_output$table_phenotypes
phenoscore_df1 <- phenoscore_df1 %>% 
  dplyr::mutate(reg = ifelse(phenoscore < 0, 'down', 'up'))

  color_list <- list('down' = '#407F7F', 'up' = '#D46A6A')

  ggplot2::ggplot(phenoscore_df1,
                  ggplot2::aes(x = forcats::fct_reorder(EndPathways, 
                                                        phenoscore),
                               y = phenoscore, fill = reg))+
    ggplot2::geom_bar(stat = 'identity', alpha = 0.8)+
    ggplot2::scale_fill_manual(values = color_list, 
                               labels = names(color_list)) +
    ggplot2::ggtitle(paste0('Phenoscore')) +
    ggplot2::ylab("phenotype modulation") +
    ggplot2::xlab("") +
    ggplot2::theme_bw()+
    ggplot2::theme(legend.title = element_blank(),
          legend.position = 'none',
          plot.title=element_text(hjust = 0.5),
          panel.grid.minor = element_blank(),
          axis.text.y = element_text( size = 14, face = 'bold'),
          axis.text.x=element_text(size=10, angle=0, vjust=0.5, hjust=0.5))+
    ggplot2::coord_flip()
<<<<<<< HEAD

=======

>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

You can also inspect the proteins in the network regulating INDIRECTLY each phenotype.


sub_table <- toy_phenoscore_output$table_regulators %>% 
  dplyr::select(EndPathways, Effect, regulators) 
DT::datatable(sub_table, options = list(pageLength = 3))
<<<<<<< HEAD
=======
>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

7 Cytoscape Visualization of SignalingProfiler network

<<<<<<< HEAD

To clearly visualize the network, we suggest to optimize the model for the last time on the phenotypic activity. Then you can use a built-in Cytoscape style to render the network. Note: open Cytoscape in your computer before this step!

=======

To clearly visualize the network, we suggest to optimize the model for the last time on the phenotypic activity. Then you can use a built-in Cytoscape style to render the network. Note: open Cytoscape in your computer before this step!

>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

solver = 'cplex'
carnival_options <- default_CARNIVAL_options(solver)

opt1 <- optimize_pheno_network(sp_object = toy_phenoscore_output$sp_object_phenotypes,
                               organism = 'human',
                               phospho_df = phospho_toy_df,
                               carnival_options = carnival_options,
                               files = TRUE,
                               direct = TRUE,
                               with_atlas = FALSE,
                               path_sif = './pheno_opt1.sif',
                               path_rds = './pheno_opt1.rds')

opt1_graph <- format_for_visualization(opt1)

7.1 Visualize the entire network


# Visualization in Cytoscape
RCy3::createNetworkFromIgraph(igraph=opt1$sp_object_phenotypes$igraph_network,
                              title = 'My model',
                              collection = 'My collection')
data_path <- system.file("extdata", "SP_pheno_layout.xml", package = "SignalingProfiler")
RCy3::importVisualStyles(filename = data_path)
RCy3::setVisualStyle('SP_pheno_layout')

Examples of SignalingProfiler 2.0 resulting network in NDEX:

7.2 Visualize functional circuits

SignalingProfiler offers the possibily to extract functional circuits from a desired start nodes to a set of end nodes (e.g., phenotypes). k is the max path length between two proteins in the circuit.


opt1_list <- readRDS('./pheno_opt1.rds')

opt1_list$sp_object_phenotypes <- format_for_visualization(opt1_list)

# Circuits on example phenotypes
phenotypes <- c('PROLIFERATION', 'AUTOPHAGY')
start_nodes <- c('MTOR', 'AMPK')

# Circuit on autophagy, ribosome and translation
circuit <- pheno_to_start_circuit(SP_object = opt1_list$sp_object_phenotypes,
                                  start_nodes = start_nodes,
                                  phenotypes = phenotypes,
                                  k = 5,
                                  start_to_top = TRUE)

RCy3::createNetworkFromIgraph(igraph=circuit,
                              title = 'My network',
                              collection = 'Metformin')
RCy3::setVisualStyle('SP_pheno_layout')
<<<<<<< HEAD

Here, we report a functional circuit from the Metformin use case, where MCF7 cell line was treated with Metformin. Signaling Profiler Workflow

=======

Here, we report a functional circuit from the Metformin use case, where MCF7 cell line was treated with Metformin. Signaling Profiler Workflow

>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

Conclusion

<<<<<<< HEAD

This is a tutorial for the usage of SignalingProfiler. For any doubt please contact us at: , , .

=======

This is a tutorial for the usage of SignalingProfiler. For any doubt please contact us at: , , .

>>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

R session info

xfun::session_info()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3.1

Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8

Package version:
<<<<<<< HEAD
  annotate_1.84.0          AnnotationDbi_1.68.0     askpass_1.2.1           
  backports_1.5.0          base64enc_0.1.3          bcellViper_1.42.0       
  BH_1.87.0.1              Biobase_2.66.0           BiocGenerics_0.52.0     
  BiocManager_1.30.25      BiocParallel_1.40.0      BiocStyle_2.34.0        
  Biostrings_2.74.1        bit_4.5.0.1              bit64_4.6.0-1           
  blob_1.2.4               bookdown_0.42            broom_1.0.7             
  bslib_0.9.0              cachem_1.1.0             callr_3.7.6             
  CARNIVAL_2.16.0          cellranger_1.1.0         checkmate_2.3.2         
  class_7.3-23             cli_3.6.4                clipr_0.8.0             
  cluster_2.1.8            codetools_0.2.20         colorspace_2.1-1        
  compiler_4.4.2           conflicted_1.2.0         cosmosR_1.14.0          
  cpp11_0.5.1              crayon_1.5.3             crosstalk_1.2.1         
  curl_6.2.0               data.table_1.16.4        DBI_1.2.3               
  dbplyr_2.5.0             decoupleR_2.12.0         digest_0.6.37           
  dorothea_1.18.0          dplyr_1.1.4              DT_0.33                 
  dtplyr_1.3.1             e1071_1.7-16             evaluate_1.0.3          
  fansi_1.0.6              farver_2.1.2             fastmap_1.2.0           
  fontawesome_0.5.3        forcats_1.0.0            formatR_1.14            
  fs_1.6.5                 futile.logger_1.4.3      futile.options_1.0.1    
  gargle_1.5.2             generics_0.1.3           GenomeInfoDb_1.42.3     
  GenomeInfoDbData_1.2.13  ggforce_0.4.2            ggplot2_3.5.1           
  glue_1.8.0               googledrive_2.1.1        googlesheets4_1.1.1     
  graph_1.84.1             graphics_4.4.2           grDevices_4.4.2         
  grid_4.4.2               GSEABase_1.68.0          gtable_0.3.6            
  haven_2.5.4              here_1.0.1               highr_0.11              
  hms_1.1.3                htmltools_0.5.8.1        htmlwidgets_1.6.4       
  httpuv_1.6.15            httr_1.4.7               ids_1.0.1               
  igraph_2.1.4             IRanges_2.40.1           isoband_0.2.7           
  jquerylib_0.1.4          jsonlite_1.8.9           KEGGREST_1.46.0         
  kernlab_0.9-33           KernSmooth_2.23-26       knitr_1.49              
  labeling_0.4.3           lambda.r_1.2.4           later_1.4.1             
  lattice_0.22-6           lazyeval_0.2.2           lifecycle_1.0.4         
  logger_0.4.0             lpSolve_5.6.23           lubridate_1.9.4         
  magick_2.8.5             magrittr_2.0.3           MASS_7.3-64             
  Matrix_1.7-2             memoise_2.0.1            methods_4.4.2           
  mgcv_1.9.1               mime_0.12                mixtools_2.0.0          
  modelr_0.1.11            munsell_0.5.1            nlme_3.1-167            
  OmnipathR_3.14.0         openssl_2.3.2            parallel_4.4.2          
  parallelly_1.42.0        permute_0.9.7            pheatmap_1.0.12         
  pillar_1.10.1            pkgconfig_2.0.3          plogr_0.2.0             
  plotly_4.10.4            plyr_1.8.9               png_0.1-8               
  polyclip_1.10.7          prettyunits_1.2.0        processx_3.8.5          
  progress_1.2.3           promises_1.3.2           proxy_0.4-27            
  ps_1.8.1                 purrr_1.0.4              R.methodsS3_1.8.2       
  R.oo_1.27.0              R.utils_2.12.3           R6_2.6.1                
  ragg_1.3.3               rappdirs_0.3.3           RColorBrewer_1.1.3      
  Rcpp_1.0.14              RcppEigen_0.3.4.0.2      RcppTOML_0.2.2          
  readr_2.1.5              readxl_1.4.3             rematch_2.0.0           
  rematch2_2.1.2           reprex_2.1.1             reticulate_1.42.0       
  rjson_0.2.23             rlang_1.1.5              rmarkdown_2.29          
  rprojroot_2.0.4          RSQLite_2.3.9            rstudioapi_0.17.1       
  RVenn_1.1.0              rvest_1.0.4              S4Vectors_0.44.0        
  sass_0.4.9               scales_1.3.0             segmented_2.1-3         
  selectr_0.4.2            SignalingProfiler_0.99.0 snow_0.4.4              
  splines_4.4.2            stats_4.4.2              stats4_4.4.2            
  stringi_1.8.4            stringr_1.5.1            survival_3.8-3          
  sys_3.4.3                systemfonts_1.2.1        textshaping_1.0.0       
  tibble_3.2.1             tidyr_1.3.1              tidyselect_1.2.1        
  tidyverse_2.0.0          timechange_0.3.0         tinytex_0.54            
  tools_4.4.2              tweenr_2.0.3             tzdb_0.4.0              
  UCSC.utils_1.2.0         utf8_1.2.4               utils_4.4.2             
  uuid_1.2.1               vctrs_0.6.5              vegan_2.6.10            
  viper_1.40.0             viridisLite_0.4.2        visNetwork_2.1.2        
  vroom_1.6.5              withr_3.0.2              xfun_0.50               
  XML_3.99.0.18            xml2_1.3.6               xtable_1.8.4            
  XVector_0.46.0           yaml_2.3.10              zip_2.3.2               
  zlibbioc_1.52.0         
======= abind_1.4.8 ade4_1.7.23 airr_1.5.0 alakazam_1.3.0 annotate_1.84.0 AnnotationDbi_1.68.0 ape_5.8.1 askpass_1.2.1 backports_1.5.0 base64enc_0.1.3 bcellViper_1.42.0 BH_1.87.0.1 Biobase_2.66.0 BiocGenerics_0.52.0 BiocManager_1.30.25 BiocParallel_1.40.0 BiocStyle_2.34.0 Biostrings_2.74.1 bit_4.5.0.1 bit64_4.6.0.1 bitops_1.0.9 blob_1.2.4 bookdown_0.42 boot_1.3.31 broom_1.0.7 bslib_0.9.0 cachem_1.1.0 callr_3.7.6 car_3.1.3 carData_3.0.5 CARNIVAL_2.16.0 cellranger_1.1.0 class_7.3-23 cli_3.6.4 clipr_0.8.0 cluster_2.1.8 codetools_0.2.20 colorspace_2.1-1 compiler_4.4.2 conflicted_1.2.0 corrplot_0.95 cosmosR_1.14.0 cowplot_1.1.3 cpp11_0.5.1 crayon_1.5.3 crosstalk_1.2.1 curl_6.2.0 data.table_1.16.4 data.tree_1.1.0 DBI_1.2.3 dbplyr_2.5.0 decoupleR_2.12.0 DelayedArray_0.32.0 Deriv_4.1.6 digest_0.6.37 doBy_4.6.25 dorothea_1.18.0 dplyr_1.1.4 DT_0.33 dtplyr_1.3.1 e1071_1.7-16 evaluate_1.0.3 fansi_1.0.6 farver_2.1.2 fastmap_1.2.0 fontawesome_0.5.3 forcats_1.0.0 formatR_1.14 Formula_1.2.5 fs_1.6.5 futile.logger_1.4.3 futile.options_1.0.1 gargle_1.5.2 generics_0.1.3 GenomeInfoDb_1.42.3 GenomeInfoDbData_1.2.13 GenomicAlignments_1.42.0 GenomicRanges_1.58.0 ggforce_0.4.2 ggplot2_3.5.1 ggpubr_0.6.0 ggrepel_0.9.6 ggsci_3.2.0 ggsignif_0.6.4 glue_1.8.0 googledrive_2.1.1 googlesheets4_1.1.1 gprofiler2_0.2.3 graph_1.84.1 graphics_4.4.2 grDevices_4.4.2 grid_4.4.2 gridExtra_2.3 GSEABase_1.68.0 gtable_0.3.6 haven_2.5.4 here_1.0.1 highr_0.11 hms_1.1.3 htmltools_0.5.8.1 htmlwidgets_1.6.4 httpuv_1.6.15 httr_1.4.7 ids_1.0.1 igraph_2.1.4 IRanges_2.40.1 isoband_0.2.7 jquerylib_0.1.4 jsonlite_1.8.9 KEGGREST_1.46.0 kernlab_0.9-33 KernSmooth_2.23-26 knitr_1.49 labeling_0.4.3 lambda.r_1.2.4 later_1.4.1 lattice_0.22-6 lazyeval_0.2.2 lifecycle_1.0.4 lme4_1.1.36 lpSolve_5.6.23 lubridate_1.9.4 magick_2.8.5 magrittr_2.0.3 MASS_7.3-64 Matrix_1.7-2 MatrixGenerics_1.18.1 MatrixModels_0.5.3 matrixStats_1.5.0 memoise_2.0.1 methods_4.4.2 mgcv_1.9.1 microbenchmark_1.5.0 mime_0.12 minqa_1.2.8 mixtools_2.0.0 modelr_0.1.11 munsell_0.5.1 networkD3_0.4 nlme_3.1-167 nloptr_2.1.1 nnet_7.3.20 numDeriv_2016.8.1.1 openssl_2.3.2 parallel_4.4.2 parallelly_1.42.0 pbkrtest_0.5.3 permute_0.9.7 pheatmap_1.0.12 pillar_1.10.1 pixmap_0.4.13 pkgconfig_2.0.3 plogr_0.2.0 plotly_4.10.4 plyr_1.8.9 png_0.1-8 polyclip_1.10.7 polynom_1.4.1 prettyunits_1.2.0 processx_3.8.5 progress_1.2.3 promises_1.3.2 proxy_0.4-27 ps_1.8.1 purrr_1.0.4 qdapRegex_0.7.8 quantreg_6.0 R6_2.6.1 ragg_1.3.3 rappdirs_0.3.3 rbibutils_2.3 RColorBrewer_1.1.3 Rcpp_1.0.14 RcppArmadillo_14.2.3.1 RcppEigen_0.3.4.0.2 RcppTOML_0.2.2 RCurl_1.98.1.16 Rdpack_2.6.2 readr_2.1.5 readxl_1.4.3 reformulas_0.4.0 rematch_2.0.0 rematch2_2.1.2 reprex_2.1.1 reticulate_1.40.0 Rhtslib_3.2.0 rjson_0.2.23 rlang_1.1.5 rmarkdown_2.29 rprojroot_2.0.4 Rsamtools_2.22.0 RSQLite_2.3.9 rstatix_0.7.2 rstudioapi_0.17.1 RVenn_1.1.0 rvest_1.0.4 S4Arrays_1.6.0 S4Vectors_0.44.0 sass_0.4.9 scales_1.3.0 segmented_2.1-3 selectr_0.4.2 seqinr_4.2.36 SignalingProfiler_2.1.4 snow_0.4.4 sp_2.2.0 SparseArray_1.6.1 SparseM_1.84.2 splines_4.4.2 stats_4.4.2 stats4_4.4.2 stringi_1.8.4 stringr_1.5.1 SummarizedExperiment_1.36.0 survival_3.8-3 sys_3.4.3 systemfonts_1.2.1 textshaping_1.0.0 tibble_3.2.1 tidyr_1.3.1 tidyselect_1.2.1 tidyverse_2.0.0 timechange_0.3.0 tinytex_0.54 tools_4.4.2 tweenr_2.0.3 tzdb_0.4.0 UCSC.utils_1.2.0 UniprotR_2.4.0 utf8_1.2.4 utils_4.4.2 uuid_1.2.1 vctrs_0.6.5 vegan_2.6.10 viper_1.40.0 viridisLite_0.4.2 visNetwork_2.1.2 vroom_1.6.5 withr_3.0.2 xfun_0.50 XML_3.99.0.18 xml2_1.3.6 xtable_1.8.4 XVector_0.46.0 yaml_2.3.10 zlibbioc_1.52.0 >>>>>>> dc3384fd1d9390421f5c883dabd29d4188c3dc98

References